图像分类模型通常会学会根据输入功能与培训数据中输出类之间的无关共发生进行预测类。我们称不需要的相关性为“数据偏见”,视觉特征导致数据偏见为“偏见因素”。在没有人类干预的情况下自动识别和减轻偏见是一个挑战。因此,我们进行了一项设计研究,以找到人类的循环解决方案。首先,我们确定了用三个专家捕获图像分类模型的偏差缓解过程的用户任务。然后,为了支持任务,我们开发了一个名为DASH的视觉分析系统,该系统允许用户在视觉上识别偏见因素,使用最先进的图像到图像到图像转换模型迭代生成合成图像,并监督改善分类精度的模型培训过程。我们对十名参与者的定量评估和定性研究证明了破折号的实用性,并为将来的工作提供了教训。
translated by 谷歌翻译
Tweedie分布是指数色散模型的特殊情况,它通常用于古典统计作为广义线性模型的分布。在这里,我们揭示了Tweedie发行版也在现代深度学习时代发挥关键作用,导致分布独立的自我监督图像去噪公式,没有清洁参考图像。具体地,通过与最近的噪声2Score自我监督的图像去噪方法和旋转点分布的鞍点近似来组合,我们可以提供一种可以用于大类噪声分布的一般封闭式去噪公式,而不知道底层噪声分布。与原始噪声2Score类似,新方法由两个连续的步骤组成:使用扰动噪声图像的分数匹配,然后是通过分布无关的Tweedie公式的闭合形式图像去噪公式。这还提出了一种系统算法来估计给定嘈杂的图像数据集的噪声模型和噪声参数。通过广泛的实验,我们证明了所提出的方法可以准确地估计噪声模型和参数,并在基准数据集和现实世界数据集中提供最先进的自我监督图像去噪表现。
translated by 谷歌翻译
现有的神经样式传输方法需要参考样式图像来将样式图像的纹理信息传输到内容图像。然而,在许多实际情况中,用户可能没有参考样式图像,但仍然有兴趣通过想象它们来传输样式。为了处理此类应用程序,我们提出了一个新的框架,它可以实现样式转移`没有'风格图像,但仅使用所需风格的文本描述。使用预先训练的文本图像嵌入模型的剪辑,我们仅通过单个文本条件展示了内容图像样式的调制。具体而言,我们提出了一种针对现实纹理传输的多视图增强的修补程序文本图像匹配丢失。广泛的实验结果证实了具有反映语义查询文本的现实纹理的成功图像风格转移。
translated by 谷歌翻译
Several leading methods on public benchmarks for depth-from-stereo rely on memory-demanding 4D cost volumes and computationally intensive 3D convolutions for feature matching. We suggest a new way to process the 4D cost volume where we merge two different concepts in one deeply integrated framework to achieve a symbiotic relationship. A feature matching part is responsible for identifying matching pixels pairs along the baseline while a concurrent image volume part is inspired by depth-from-mono CNNs. However, instead of predicting depth directly from image features, it provides additional context to resolve ambiguities during pixel matching. More technically, the processing of the 4D cost volume is separated into a 2D propagation and a 3D propagation part. Starting from feature maps of the left image, the 2D propagation assists the 3D propagation part of the cost volume at different layers by adding visual features to the geometric context. By combining both parts, we can safely reduce the scale of 3D convolution layers in the matching part without sacrificing accuracy. Experiments demonstrate that our end-to-end trained CNN is ranked 2nd on KITTI2012 and ETH3D benchmarks while being significantly faster than the 1st-ranked method. Furthermore, we notice that the coupling of image and matching-volume improves fine-scale details as demonstrated by our qualitative analysis.
translated by 谷歌翻译
Uniform-precision neural network quantization has gained popularity since it simplifies densely packed arithmetic unit for high computing capability. However, it ignores heterogeneous sensitivity to the impact of quantization errors across the layers, resulting in sub-optimal inference accuracy. This work proposes a novel neural architecture search called neural channel expansion that adjusts the network structure to alleviate accuracy degradation from ultra-low uniform-precision quantization. The proposed method selectively expands channels for the quantization sensitive layers while satisfying hardware constraints (e.g., FLOPs, PARAMs). Based on in-depth analysis and experiments, we demonstrate that the proposed method can adapt several popular networks channels to achieve superior 2-bit quantization accuracy on CIFAR10 and ImageNet. In particular, we achieve the best-to-date Top-1/Top-5 accuracy for 2-bit ResNet50 with smaller FLOPs and the parameter size.
translated by 谷歌翻译
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (H-LINE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
translated by 谷歌翻译
Cartoonization is a task that renders natural photos into cartoon styles. Previous deep cartoonization methods only have focused on end-to-end translation, which may hinder editability. Instead, we propose a novel solution with editing features of texture and color based on the cartoon creation process. To do that, we design a model architecture to have separate decoders, texture and color, to decouple these attributes. In the texture decoder, we propose a texture controller, which enables a user to control stroke style and abstraction to generate diverse cartoon textures. We also introduce an HSV color augmentation to induce the networks to generate diverse and controllable color translation. To the best of our knowledge, our work is the first deep approach to control the cartoonization at inference while showing profound quality improvement over to baselines.
translated by 谷歌翻译
Harmonic functions are abundant in nature, appearing in limiting cases of Maxwell's, Navier-Stokes equations, the heat and the wave equation. Consequently, there are many applications of harmonic functions, spanning applications from industrial process optimisation to robotic path planning and the calculation of first exit times of random walks. Despite their ubiquity and relevance, there have been few attempts to develop effective means of representing harmonic functions in the context of machine learning architectures, either in machine learning on classical computers, or in the nascent field of quantum machine learning. Architectures which impose or encourage an inductive bias towards harmonic functions would facilitate data-driven modelling and the solution of inverse problems in a range of applications. For classical neural networks, it has already been established how leveraging inductive biases can in general lead to improved performance of learning algorithms. The introduction of such inductive biases within a quantum machine learning setting is instead still in its nascent stages. In this work, we derive exactly-harmonic (conventional- and quantum-) neural networks in two dimensions for simply-connected domains by leveraging the characteristics of holomorphic complex functions. We then demonstrate how these can be approximately extended to multiply-connected two-dimensional domains using techniques inspired by domain decomposition in physics-informed neural networks. We further provide architectures and training protocols to effectively impose approximately harmonic constraints in three dimensions and higher, and as a corollary we report divergence-free network architectures in arbitrary dimensions. Our approaches are demonstrated with applications to heat transfer, electrostatics and robot navigation, with comparisons to physics-informed neural networks included.
translated by 谷歌翻译
Brain-computer interface (BCI) uses brain signals to communicate with external devices without actual control. Particularly, BCI is one of the interfaces for controlling the robotic arm. In this study, we propose a knowledge distillation-based framework to manipulate robotic arm through hybrid paradigm induced EEG signals for practical use. The teacher model is designed to decode input data hierarchically and transfer knowledge to student model. To this end, soft labels and distillation loss functions are applied to the student model training. According to experimental results, student model achieved the best performance among the singular architecture-based methods. It is confirmed that using hierarchical models and knowledge distillation, the performance of a simple architecture can be improved. Since it is uncertain what knowledge is transferred, it is important to clarify this part in future studies.
translated by 谷歌翻译
Score-based generative models are shown to achieve remarkable empirical performances in various applications such as image generation and audio synthesis. However, a theoretical understanding of score-based diffusion models is still incomplete. Recently, Song et al. showed that the training objective of score-based generative models is equivalent to minimizing the Kullback-Leibler divergence of the generated distribution from the data distribution. In this work, we show that score-based models also minimize the Wasserstein distance between them under suitable assumptions on the model. Specifically, we prove that the Wasserstein distance is upper bounded by the square root of the objective function up to multiplicative constants and a fixed constant offset. Our proof is based on a novel application of the theory of optimal transport, which can be of independent interest to the society. Our numerical experiments support our findings. By analyzing our upper bounds, we provide a few techniques to obtain tighter upper bounds.
translated by 谷歌翻译